2  Introduction to Sport Data Analytics (1.1)

2.1 Learning Outcomes

By the end of this tutorial, you should:

  • have a basic understanding of the term ‘sport data analytics’

  • understand the historical context in which sport data analytics emerged

  • be familiar with some of the key debates within sport data analytics

2.2 Reading

Before this tutorial, you should access and review the following key papers:

  • Morgulev, Elia, Ofer H. Azar, and Ronnie Lidor. 2018. “Sports Analytics and the Big-Data Era.” International Journal of Data Science and Analytics 5 (4): 213–22.[1]

  • Kohe, Geoffery Z., and Laura G. Purdy. 2019. “Analytical Attractions and the Techno-Continuum: Conceptualising Data Obsessions and Consequences in Elite Sport.” Sport, Education and Society 24 (7): 742–55.[2]

  • Williams, Shaun, and Andrew Manley. 2016. “Elite Coaching and the Technocratic Engineer: Thanking the Boys at Microsoft!” Sport, Education and Society 21 (6): 828–50.[3]

There are direct links to these papers via the library reading list.

2.3 Defining the Field

What exactly is ‘sport data analytics’? Simply put, it’s the application of data science techniques to sport, with the goal of improving various aspects of decision-making and performance. It involves the collection, organisation, and interpretation of (usually) large amounts of sport-related data.

In sport data analytics, techniques such as predictive modelling, machine learning, and statistical analysis are utilised to extract meaningful insights from this data.
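As a minimal illustration of the collection, organisation, and interpretation steps described above, the Python sketch below takes a handful of raw per-match records and turns them into a simple comparative statistic: distance covered per 90 minutes. The player labels, the figures, and the distance_per_90 helper are hypothetical, invented purely for illustration. Real tracking data would be far larger and messier.

```python
from collections import defaultdict
from statistics import mean

# Collection: hypothetical raw records, one row per player per match.
records = [
    {"player": "A", "minutes": 90, "distance_km": 10.8},
    {"player": "A", "minutes": 45, "distance_km": 5.7},
    {"player": "B", "minutes": 90, "distance_km": 9.0},
    {"player": "B", "minutes": 90, "distance_km": 9.9},
]

def distance_per_90(records):
    """Hypothetical helper: average distance per 90 minutes for each player."""
    # Organisation: group each match's rate under the player it belongs to.
    by_player = defaultdict(list)
    for r in records:
        # Scale to a 90-minute rate so partial appearances are comparable.
        by_player[r["player"]].append(r["distance_km"] / r["minutes"] * 90)
    # Interpretation: one summary figure per player, comparable across the squad.
    return {player: round(mean(rates), 2) for player, rates in by_player.items()}

print(distance_per_90(records))
```

Normalising to a per-90 rate is the important design choice here: without it, player A's half-match appearance would make them look less active than they were.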

These insights might relate to player performance, team dynamics, injury prevention, or even fan engagement. The scope of sport data analytics ranges from individual players or athletes to teams, clubs, and entire leagues.

2.4 Current Role

Data analytics now plays a central role in sport. Decision-making, whether about player selection, in-game strategy, or injury management, is becoming increasingly data-driven [4].

Coaches, players, managers, and even fans are now relying on data analytics to gain a competitive edge [5]. This growing demand stems from the realisation that data can provide objective insights that can supplement, and in some cases supersede, traditional subjective judgments. It’s about turning raw data into actionable knowledge, thereby enhancing performance, mitigating risk, and optimising training and in-game strategy.

2.5 A Brief History

Looking back over the use of analytical approaches in sport, we can identify the roots of current practice in the middle of the 20th century. Early adopters of performance analytics used rudimentary statistics to assess player performance [6], [7].

A notable breakthrough came in the 1970s with the emergence of ‘Sabermetrics’ in baseball [8]. This approach, pioneered by Bill James, prioritised statistical analysis over traditional scouting methods to evaluate player performance [9].
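To give a flavour of what prioritising statistics over scouting looked like in practice, the sketch below implements the basic form of Bill James’s ‘Runs Created’ estimate, RC = (H + BB) × TB / (AB + BB), where H is hits, BB is walks, TB is total bases, and AB is at-bats, and sets it beside the traditional batting average. The two season lines are invented for illustration, and James later produced more elaborate versions of the formula; this is only the simplest one.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Traditional measure: hits per at-bat (ignores walks and extra-base hits)."""
    return hits / at_bats

def runs_created(hits: int, walks: int, total_bases: int, at_bats: int) -> float:
    """Basic Runs Created: (H + BB) * TB / (AB + BB).

    Multiplies how often a batter reaches base (hits + walks)
    by how far they advance runners (total bases).
    """
    return (hits + walks) * total_bases / (at_bats + walks)

# Two hypothetical season lines (invented numbers, for illustration only).
player_x = dict(hits=150, walks=20, total_bases=210, at_bats=500)
player_y = dict(hits=130, walks=90, total_bases=230, at_bats=500)

for name, p in [("X", player_x), ("Y", player_y)]:
    avg = batting_average(p["hits"], p["at_bats"])
    rc = runs_created(**p)
    print(f"{name}: AVG {avg:.3f}, RC {rc:.1f}")
```

Player X has the higher batting average, but player Y, who draws far more walks and collects more total bases, comes out ahead on Runs Created. This is the kind of difference that a single traditional measure can hide.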

The turn of the millennium saw a dramatic acceleration in the use of analytics, driven by advances in technology [10]. New data collection methods, such as wearable technology and video tracking, produced an explosion in the volume of data available for analysis.

Simultaneously, the evolution of data processing capabilities and analytical techniques made it possible to analyse this wealth of data in a meaningful way [11].

Today, sport data analytics is an integral part of the sporting industry, present in almost all sports and influencing decisions at all levels. Its significance is likely to keep growing as technology advances, making it an exciting field to be part of [12].

Despite its relatively short history, analytics has already had a transformative impact on sport, and it is likely to shape the future of sport in ways we are only beginning to understand.

2.6 Key Debates

2.6.1 Introduction

As noted above, sport data analytics, situated at the intersection of sport and technology, has emerged as a transformative field in recent years [1].

In the following section, we’ll examine some of the key debates that shape the world of sport data analytics, providing a broad perspective on the field’s most significant considerations and challenges.

In the era before data analytics, decisions in sport, ranging from player selection to game strategy, were driven primarily by qualitative insights and gut feeling [13].

The advent of data analytics has brought about a seismic shift in this landscape. The introduction of statistics, predictive modelling, and machine learning has enabled deeper insights into player performance, injury prevention, and even fan engagement [14].

However, the transition to this data-driven approach has generated several debates with which you should be familiar.

2.6.2 Qualitative vs Quantitative Analysis

One of the most prominent debates revolves around qualitative versus quantitative analysis [15].

While qualitative analysis is driven by subjective judgments and observations, quantitative analysis leans on numbers and statistical methods. Both approaches offer unique advantages and drawbacks.

Qualitative analysis allows for the inclusion of nuanced, contextual insights, while quantitative analysis offers measurable, objective data points. The key question in this debate is how to find the ideal balance between these two approaches for optimal decision-making.

2.6.3 Player Privacy vs Data Collection

The desire to collect and analyse data is now central to sport data analytics, but it inevitably leads us to confront ethical concerns about player privacy [16]. Today, data can be collected on everything from a player’s on-field performance to their heart rate, sleep patterns, and dietary habits.

While this granular level of detail can significantly enhance analytic capabilities, it also poses questions about privacy. How do we maintain the necessary data collection while respecting players’ privacy rights? This question remains a critical topic of discussion that we will return to across the module [17].

2.6.4 Traditional Scouting vs Data-Driven Approaches

Traditional scouting, which has long been the foundation of player selection, is under scrutiny in the age of data analytics [18]. Data-driven approaches allow for more detailed assessments of a player’s potential and performance, but can they replace the subjective insight of experienced scouts? The debate continues, prompting a search for ways in which the two approaches can coexist and complement each other.

2.6.5 Subjectivity vs Objectivity in Sport Analysis

Closely related to the above debate is the discussion around subjectivity versus objectivity. Data analytics is often presented as objective, reducing the influence of personal bias and promoting fairness in decision-making, although decisions about what to measure and how to interpret the results still involve human judgement.

However, subjective human judgement, based on experience and intuition, has been an integral part of sports analysis for a long time [19]. Striking the right balance between the two is a topic of ongoing debate.

2.6.6 Conclusion

You’ll have realised that sport data analytics is a dynamic field, with many ongoing debates and emerging trends that will continue to shape its future (and which you’ll be part of). As we move through this module, we’ll examine some of these debates in more detail, considering their implications and challenges and exploring potential solutions.

Remember, the essence of these discussions lies in a balancing act: combining traditional and contemporary methods, blending qualitative insights with quantitative data, and ensuring that player privacy and ethical data collection remain a priority.

2.7 ‘Data Analytics’ and ‘Data Science’

One question that sometimes arises in this area is: what is the difference between data analytics and data science? The two are related fields, but they differ in their scope, focus, and techniques.

2.7.1 Scope

Data analytics is usually considered to be a subfield of data science. It is focused on extracting meaningful insights from data using various analytical techniques.

Data science, on the other hand, is a broader, interdisciplinary field that encompasses data analytics, but also includes other areas such as data engineering, data storage, and data processing.

2.7.2 Focus

The primary goal of data analytics is to identify trends, patterns, and relationships within the data to support decision-making.

The primary goal of data science is to generate actionable insights and build data-driven products or solutions, often leveraging advanced techniques and algorithms.

2.7.3 Techniques

Data analytics primarily involves descriptive (what happened?), predictive (what is likely to happen?), and prescriptive (what should we do?) analytics, using statistical methods, visualisation techniques, and basic machine learning algorithms.

Data science combines techniques from various disciplines, including statistics, machine learning, artificial intelligence, computer science, and domain-specific knowledge, to process, analyse, and interpret complex datasets.

2.7.4 Skills

Data analysts often have a strong background in mathematics, statistics, and domain-specific knowledge. They typically use tools like Excel, SQL, and programming languages like R or Python for data manipulation and analysis.

Data scientists typically have strong programming skills, expertise in machine learning and AI, and knowledge of big data tools and platforms like Hadoop and Spark. They often have advanced degrees in fields like computer science, statistics, or applied mathematics.

2.7.5 Summary

In summary, data analytics is a narrower field focused on analysing data to derive insights, while data science is a broader discipline that encompasses data analytics and additional aspects like data engineering, advanced machine learning, and AI techniques.

2.8 Questions for Discussion

  • How do you perceive the impact of data analytics on traditional practices in sport such as player selection and game strategies? Can you identify potential benefits and drawbacks associated with the transition to a more data-driven approach?

  • Discuss the balance between data collection for performance analysis and player privacy. How can the sporting industry ensure ethical data collection practices while still acquiring necessary performance information? What might be some potential solutions or compromises?

  • What are the key differences between data analytics and data science, in terms of scope, focus, and techniques? How do these differences affect their applications in real-world scenarios? In what ways do these two fields complement each other, and how might their roles evolve in the future?